Construction of Bayesian Network Structures From Data: A Brief Survey and an Efficient Algorithm*

Authors

  • Moninder Singh
  • Marco Valtorta
Abstract

Previous algorithms for the recovery of Bayesian belief network structures from data have either been highly dependent on conditional independence (CI) tests or have required an ordering on the nodes to be supplied by the user. We present an algorithm that integrates these two approaches: CI tests are used to generate an ordering on the nodes from the database, which is then used to recover the underlying Bayesian network structure using a non-CI-test-based method. Results of the evaluation of the algorithm on a number of databases (e.g., ALARM, LED, and SOYBEAN) are presented. We also discuss some algorithm performance issues and open problems.

KEYWORDS: Bayesian networks, probabilistic networks, probabilistic model construction, conditional independence

Address correspondence to Professor Marco Valtorta, Department of Computer Science, The University of South Carolina, Columbia, SC 29208. E-mail: msingh@gradient.cis.upenn.edu or mgv@usceast.cs.scarolina.edu.
*A preliminary version of this paper was presented in [1].
*Current address is the Department of Computer and Information Science, University of Pennsylvania, 200 S 33rd St., Philadelphia, PA 19104.
Received May 1994; accepted September 1994.
International Journal of Approximate Reasoning 1995; 12:111-131. © 1995 Elsevier Science Inc., 655 Avenue of the Americas, New York, NY 10010. 0888-613X/95/$9.50. SSDI 0888-613X(94)00016-V.

1. INTRODUCTION

In very general terms, different methods of learning probabilistic network structures from data can be classified into three groups. Some of these methods are based on linearity and normality assumptions [2, 3]; others are more general but require extensive testing of independence relations [4-8]; others yet take a Bayesian approach [9-12]. In this paper, we do not consider methods of the first kind, namely, those that make linearity and normality assumptions.
Our work concentrates on CI-test-based methods and Bayesian methods. A number of algorithms have been designed which are based on CI tests. However, such algorithms have two major drawbacks. First, the CI test requires determining independence relations of order $n - 2$ in the worst case. "Such tests may be unreliable, unless the volume of data is enormous" [10, p. 332]. Second, as Verma and Pearl [5, pp. 326-327] have noted, "in general, the set of all independence statements which hold for a given domain will grow exponentially as the number of variables grow." Thus, CI-test-based approaches rapidly become computationally infeasible as the number of vertices increases. Spirtes and Glymour [6, p. 62] have presented "an asymptotically correct algorithm whose complexity for fixed graph connectivity increases polynomially in the number of vertices, and may in practice recover sparse graphs with several hundred variables"; but for dense graphs with limited data, the algorithm might be unreliable [10].

On the other hand, Cooper and Herskovits [10] have given a Bayesian non-CI-test-based method, which they call the BLN (Bayesian learning of belief networks) method. Given that a set of four assumptions holds [10, p. 338], namely, (i) the database variables are discrete, (ii) cases occur independently, given a belief network model, (iii) all variables are instantiated to some value in every case, and finally (iv) before observing the database, we are indifferent regarding the numerical probabilities to place on the belief network structure, Cooper and Herskovits [10] have shown the following result:

THEOREM 1 (Due to Cooper and Herskovits [10]) Consider a set $Z$ of $n$ discrete variables. Each variable $x_i \in Z$ has $r_i$ possible value assignments: $(v_{i1}, \ldots, v_{ir_i})$. Let $D$ be a database of $m$ complete cases, i.e., each case contains a value assignment for each variable in $Z$. Let $B_S$ denote a belief-network structure containing just the variables in $Z$. Each variable $x_i$ in $B_S$ has a set of parents $\pi_i$. Let $w_{ij}$ denote the $j$th unique instantiation of $\pi_i$ relative to $D$, and suppose there are $q_i$ such unique instantiations of $\pi_i$. Let $N_{ijk}$ be the number of cases in $D$ in which $x_i$ is instantiated to $v_{ik}$, while $\pi_i$ is instantiated to $w_{ij}$. Let $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$. Then
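
The excerpt breaks off before the formula itself. As a hedged illustration of the quantities the theorem defines, the sketch below tallies $N_{ijk}$ and $N_{ij}$ from a small database and plugs them into the closed form usually cited from Cooper and Herskovits [10], $P(B_S, D) = P(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$. That formula is recalled from the literature rather than taken from the text above. The function names, the dictionary-based data layout, and the omission of the prior $P(B_S)$ (treated as uniform) are assumptions made for this example.

```python
from collections import Counter
from math import lgamma


def family_counts(cases, child, parents):
    """Map each observed parent instantiation w_ij to a Counter {v_ik: N_ijk}."""
    counts = {}
    for case in cases:
        w = tuple(case[p] for p in parents)
        counts.setdefault(w, Counter())[case[child]] += 1
    return counts


def log_family_score(cases, child, parents, arity):
    """Log of prod_j [(r_i - 1)! / (N_ij + r_i - 1)!] * prod_k N_ijk! for one variable."""
    r = arity[child]  # r_i, the number of values the child can take
    total = 0.0
    for n_ijk in family_counts(cases, child, parents).values():
        n_ij = sum(n_ijk.values())
        # lgamma(x) = log((x - 1)!), so this is log[(r_i - 1)! / (N_ij + r_i - 1)!]
        total += lgamma(r) - lgamma(n_ij + r)
        # log of prod_k N_ijk!; values never observed contribute 0! = 1
        total += sum(lgamma(n + 1) for n in n_ijk.values())
    return total


def log_structure_score(cases, structure, arity):
    """Sum of per-variable log scores; the prior P(B_S) is left out (assumed uniform)."""
    return sum(
        log_family_score(cases, child, parents, arity)
        for child, parents in structure.items()
    )


# Hypothetical toy database: two binary variables that always agree.
cases = [{"a": 0, "b": 0}, {"a": 0, "b": 0}, {"a": 1, "b": 1}, {"a": 1, "b": 1}]
arity = {"a": 2, "b": 2}
with_edge = {"a": [], "b": ["a"]}  # structure a -> b
no_edge = {"a": [], "b": []}       # structure with no arcs

print(log_structure_score(cases, with_edge, arity))
print(log_structure_score(cases, no_edge, arity))
```

On this toy database the structure with the arc from `a` to `b` receives the higher score, as one would expect when the two variables are perfectly dependent. Working in log space with `lgamma` avoids overflow in the factorials once the counts grow.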

Similar resources

Construction of Bayesian network structures from data: A brief survey and an efficient algorithm

Previous algorithms for the recovery of Bayesian belief network structures from data have been either highly dependent on conditional independence (CI) tests, or have required an ordering on the nodes to be supplied by the user. We present an algorithm that integrates these two approaches: CI tests are used to generate an ordering on the nodes from the database which is then used to recover the ...

Construction of Bayesian Network Structures from Data: a Brief Survey and an Efficient Algorithm

Previous algorithms for the recovery of Bayesian belief network structures from data have been either highly dependent on conditional independence (CI) tests, or have required an ordering on the nodes to be supplied by the user. We present an algorithm that integrates these two approaches: CI tests are used to generate an ordering on the nodes from the database which is then used to recover the ...

Structural Reliability: An Assessment Using a New and Efficient Two-Phase Method Based on Artificial Neural Network and a Harmony Search Algorithm

In this research, a two-phase algorithm based on the artificial neural network (ANN) and a harmony search (HS) algorithm has been developed with the aim of assessing the reliability of structures with implicit limit state functions. The proposed method involves the generation of datasets to be used specifically for training by Finite Element analysis, to establish an ANN model using a proven AN...

An Irregular Lattice Pore Network Model Construction Algorithm

Pore network modeling uses a network of pores connected by throats to model the void space of a porous medium and tries to predict its various characteristics during multiphase flow of various fluids. In most cases, a non-realistic regular lattice of pores is used to model the characteristics of a porous medium. Although some methodologies for extracting geologically realistic irregular net...

Robust Opponent Modeling in Real-Time Strategy Games using Bayesian Networks

Opponent modeling is a key challenge in Real-Time Strategy (RTS) games as the environment is adversarial in these games, and the player cannot predict the future actions of her opponent. Additionally, the environment is partially observable due to the fog of war. In this paper, we propose an opponent model which is robust to the observation noise existing due to the fog of war. In order to cope...

A Model for Tax Evasion Forcasting based on ID3 Algorithm and Bayesian Network

Nowadays, knowledge is a valuable and strategic source as well as an asset for evaluation and forecasting. Presenting these strategies in discovering corporate tax evasion has become an important topic today and various solutions have been proposed. In the past, various approaches to identify tax evasion and the like have been presented, but these methods have not been very accurate and the ove...

Publication date: 1995